Abstract: To detect changes in the mean of a time series, one may use previsible detection procedures based on nonparametric kernel prediction smoothers which cover various classic detection statistics as special cases. Bandwidth selection, particularly in a data-adaptive way, is a serious issue and not well studied for detection problems. To ensure data adaptation, we select the bandwidth by cross-validation, but in a sequential way leading to a functional estimation approach. This article provides the asymptotic theory for the method under fairly weak assumptions on the dependence structure of the error terms, which cover, e.g., GARCH(p, q) processes, by establishing (sequential) functional central limit theorems for the cross-validation objective function and the associated bandwidth selector. It turns out that the proof can be based in a neat way on Kurtz and Protter (1996)'s results on the weak convergence of Ito integrals and a diagonal argument. Our gradual change-point model covers multiple change-points in that it allows for a nonlinear regression function after the first change-point possibly with further jumps and Lipschitz continuous between those discontinuities.

In applications, the time horizon where monitoring stops latest is often determined by a random experiment, e.g. a first-exit stopping time applied to a cumulated cost process or a risk measure, possibly stochastically dependent on the monitored time series. Thus, we also study that case and establish related limit theorems in the spirit of Anscombe (1952)'s result. This is achieved by embedding the stopped processes into a sequence of processes, which allows us to handle the randomly determined time horizon as a random change of time problem. The result has various applications including statistical parameter estimation and monitoring financial investment strategies with risk-controlled early termination, which are briefly discussed.
Risk control; Stochastic integration; Time series.

1 INTRODUCTION

Assuming a nonparametric regression model with non-vanishing mean, we study the sequential asymptotic distribution theory of the cross-validated bandwidth selector for a previsible kernel detection statistic. The nonparametric regression model assumes that the mean of a process observed in discrete time is given by an unknown function belonging to some infinite dimensional function space, providing an attractive framework for statistical estimation as well as detection problems. Estimation and inference based on an observed (large) sample have been extensively studied in the literature. For an overview of this topic and the most common methods such as kernel estimators, local polynomials, smoothing splines and wavelets, we refer to Donoho and Johnstone (1994), Eubank (1988), Härdle (1991) and Wand and Jones (1995) and the references given therein.

However, often the data arrive sequentially and interest is in detecting changes in the mean function, for instance that the mean is too large or too small. There is growing interest in sequential methods due to their importance for the analysis of data streams in areas such as finance, environmetrics and engineering. Procedures based on nonparametric smoothers form an attractive class of methods, which has received substantial interest in the literature; we refer to Wu and Chu (1993), Müller and Stadtmüller (1999), Steland (2005) and Steland (2010), amongst others. Now invariance principles provide a neat way to obtain distributional approximations under weak assumptions. For a discussion of that approach we refer to Steland (2010). There is also a rich literature on the estimation of regression functions that are smooth except for some discontinuity (change-) points. See, for example, the recent work of Gijbels and Goderniaux (2004) or Antoch et al. (2007).



The problem of how to select the bandwidth or, more generally, tuning constants, is a serious issue, which is not well understood for the sequential detection problem. Here, asymptotic results for methods such as the CUSUM, MOSUM or EWMA procedures, studied by Han and Tsung (2004), Brodsky and Darkhovsky (2008), Aue et al. (2008), and Moustakides (2008) amongst many others, assume that the bandwidth parameter may depend on some size parameter, usually the time horizon or maximal sample size, but it is chosen in a non-stochastic (deterministic) way; a notable exception is the work of Spokoiny (1998) and Spokoiny and Polzehl (2006). For a novel consistent approach to the selection of such tuning constants using singular value decomposition methods see Golyandina et al. (2012). In Steland (2010) we proposed to apply cross-validation, a general technique widely employed, in a sequential way to a sequence of kernel prediction statistics related to the well known Nadaraya-Watson estimator. Using a sequence of prediction statistics forming a previsible process and being close to the Nadaraya-Watson estimator has the advantage that detection is based on a control statistic which can be interpreted as a one-step ahead predictor providing sequential approximations to the process mean. In contrast, various classic methods for change detection lack that nice interpretation.

In the sequential approach, the cross-validated bandwidth is, in principle, calculated at each time point thus yielding a functional bandwidth estimate. Steland (2010) establishes uniform laws of large numbers for the corresponding objective function as well as consistency theorems for the bandwidth estimate. To guarantee wide applicability, those results are proved for i.i.d. data as well as for L2-NED processes. The present article focuses on the relevant asymptotic distribution theory in the sense of weak convergence. We provide sequential functional central limit theorems for the cross-validation criterion as well as for the functional estimate of the cross-validated bandwidth. An important tool for our theoretical results is a general result on the weak convergence of Ito integrals for integrators which are semimartingales. We show that the results hold true under a weak $\alpha$-mixing condition which is satisfied by many processes, e.g. many linear processes, which are also known to be S-mixing, a class of processes for which Berkes et al. (2009) recently established a strong invariance principle for the classic sequential empirical process. For related results for long-memory processes we refer to Dehling and Taqqu (1989) and the work of Doukhan et al. (2005) which allows for a weakly dependent nonlinear Bernoulli shift component. Further results can be found in Dehling and Mikosch (2002).

In a first step, we assume that the observations arrive sequentially until a non-random time horizon $T \to \infty$ is reached. By rescaling time to the unit interval, the Skorohod spaces of right-continuous functions with left-hand limits provide an appropriate framework to establish a weak limit theory. However, in certain applications the time horizon is not fixed but determined by a parameterized random experiment such as a family of random first exit stopping times. The question arises under which conditions on those stopping times the stopped process inherits the asymptotic distribution. Results of this type can be traced back to the seminal work of Anscombe (1952), who studied the large-sample theory of randomly stopped stochastic processes in discrete time. Thus, in a second step, we show that in our framework an embedding argument allows us to interpret the randomly selected time horizon as a random change of time problem leading to an Anscombe-type theorem. We assume the same condition on the family of random indices replacing the time horizon $T$ as imposed by Anscombe.

The sequential setup is as follows: We assume that observations $Y_n = Y_{Tn}$, $1 \le n \le T$, arrive sequentially until the maximum sample size $T$ is reached and satisfy the model equation

\[ Y_n = m(x_n) + \epsilon_n, \qquad n = 1, 2, \dots, \tag{1.1} \]

with

\[ m(x_n) = m_0(x_n) + \delta(x_n)/\sqrt{T}. \tag{1.2} \]

The time horizon $T$ is assumed to be non-random and large; it will converge to $\infty$ in our limit theorems. Extensions to random time horizons are discussed in Section 5. The function $m_0$ is assumed to be known. $\delta$ is a bounded and piecewise Lipschitz continuous function on $[0, \infty)$ with at most finitely many jumps, either $\delta \ge 0$ or $\delta \le 0$, and such that

\[ q_1 = \inf\{ s > 0 : \delta(s) \ne 0 \} > \gamma \]

for some $\gamma \in (0, 1)$. In detection, the primary goal is to detect changes from an assumed model, $m_0$, for the process, also called in-control model or null model. The departure from that in-control or normal behavior is modeled by the function $\delta$. Of particular interest is the detection of the first change point $q_1$. When $\delta$ is a smooth function, the above model is also called gradual change model, since then the process mean smoothly drifts away from the assumed in-control behavior. But because we allow for $\delta$-functions with jumps, (1.2) is very general and covers the case that there are many change-points where the mean changes abruptly, e.g. when $\delta$ is a step function representing a finite number of level shifts. Hence, we treat a large class of change-point models in a unified way. We consider a sequence of local alternatives converging to the null model at the rate $T^{-1/2}$, which will allow us to establish weak limits for the quantities of interest providing a means to study local performance properties, e.g. by simulating from the limit process. Of substantial interest is the detection of the first change-point of $\delta$ after some initial time instance $s_0$ where the monitoring procedure starts, i.e. $\inf\{ s > s_0 : \delta(s) > 0 \}$, respectively.
For the regressors $\{x_n\}$ a fixed design

\[ x_n = G^{-1}(n/T), \qquad 1 \le n \le T, \]

induced by some design distribution function $G$ is assumed. In many applications one can assume that $G$ is known or chosen by the statistician. Examples cover biostatistical dose-response studies, applications in communication engineering with equidistant sampling as well as laboratory experiments where the design points are selected according to some external criterion, cf. also Steland (2010). For simplicity of our exposition, we will assume that $G = \mathrm{id}$, since otherwise one may substitute $m_0$ by $m_0 \circ G^{-1}$ and $\delta$ by $\delta \circ G^{-1}$. The term $\delta_T = \delta/\sqrt{T}$ in (1.1) represents the local alternative model describing the departure from $m_0$.
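The local alternative model can be made concrete by a small simulation. The following is a minimal sketch, not part of the original exposition: it assumes $G = \mathrm{id}$, i.i.d. standard normal errors (one simple example of a martingale difference sequence), $m_0 = 0$ and a single level shift for $\delta$. The function names `simulate_local_alternative` and `step_delta` are hypothetical.

```python
import numpy as np

def simulate_local_alternative(T, delta, m0=lambda x: np.zeros_like(x), sigma=1.0, seed=0):
    """Draw Y_n = m0(x_n) + delta(x_n)/sqrt(T) + eps_n on the fixed design x_n = n/T."""
    rng = np.random.default_rng(seed)
    x = np.arange(1, T + 1) / T             # fixed design with G = id (assumption)
    eps = sigma * rng.standard_normal(T)    # i.i.d. N(0, sigma^2) errors (assumption)
    mean = m0(x) + delta(x) / np.sqrt(T)    # local alternative shrinking at rate T^(-1/2)
    return x, mean + eps, mean

def step_delta(x, q1=0.5, size=2.0):
    """Hypothetical delta: zero before the change-point q1, a level shift afterwards."""
    return np.where(x >= q1, size, 0.0)

x, y, mean = simulate_local_alternative(400, step_delta)
```

With $T = 400$ the mean after the change-point equals $2/\sqrt{400} = 0.1$, which illustrates how small the departure from $m_0$ is relative to the unit noise level.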

Our results work under the weak assumption that the errors $\{\epsilon_t\}$ form a strictly stationary martingale difference sequence satisfying a classic condition on the strong mixing coefficients. Some of our results even hold true for stationary martingale difference sequences without additional assumptions.

The organization of the paper is as follows. In Section 2 we introduce the sequential cross-validation approach. Section 3 provides some basic notation and preliminaries as well as an exposition of a result on the weak convergence of Ito integrals, which we shall use to prove the results. The main asymptotic results are given and proved in Section 4. Section 5 discusses the extension to random time horizons, its relationship to Anscombe's classical result and our Anscombe-type result based on a random change of time argument.



2 SEQUENTIAL CROSS-VALIDATION 

The statistical idea of cross-validation is to choose nuisance parameters such as tuning constants controlling the degree of smoothing of a statistic in a data-adaptive way such that the corresponding estimates provide a good fit on average. Let us define the sequential, i.e., $\mathcal{F}_{i-1} = \sigma(Y_j : 1 \le j \le i-1)$-measurable prediction estimate

\[ \hat m_{h,-i} = N_{T,-i}^{-1} \, \frac{1}{h} \sum_{j=\lfloor T\gamma \rfloor}^{i-1} K([i-j]/h)\, Y_j, \qquad i = \lfloor T\gamma \rfloor, \lfloor T\gamma \rfloor + 1, \dots, \tag{2.1} \]

where $N_{T,-i} = h^{-1} \sum_{j=\lfloor T\gamma \rfloor}^{i-1} K([i-j]/h)$ and $\gamma \in (0, 1)$ is an arbitrarily small but fixed constant. $K$ is a kernel function such that

\[ K \in \mathrm{Lip}([0, \infty); [0, \infty)), \qquad \|K\|_\infty < \infty \quad \text{and} \quad K > 0, \tag{2.2} \]

where $\mathrm{Lip}([0, \infty); \mathbb{R})$ denotes the class of Lipschitz continuous functions on $[0, \infty)$. We assume that the bandwidth $h > 0$ is a function of the time horizon $T$ in such a way that

\[ |T/h - \xi| = O(1/T) \tag{2.3} \]

for some constant $\xi \in (0, \infty)$. Imposing the convergence rate $T^{-1}$ rules out artificial choices such as $h = T/(\xi + T^{-\gamma})$, $\gamma > 0$, leading to arbitrarily slow convergence.
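To illustrate the previsible prediction estimate (2.1), the following sketch computes $\hat m_{h,-i}$ for all admissible $i$. It is an illustration under stated assumptions rather than the author's implementation: the exponential kernel $K(x) = e^{-x}$ is one kernel compatible with (2.2), and the function name `kernel_predictions` is hypothetical.

```python
import numpy as np

def kernel_predictions(y, h, gamma, kernel=lambda x: np.exp(-x)):
    """Return m_hat with m_hat[i] (1-based i) built only from Y_j, floor(T*gamma) <= j <= i-1."""
    y = np.asarray(y, dtype=float)
    T = y.size
    start = max(int(np.floor(T * gamma)), 1)    # first observation index entering the sums
    m_hat = np.full(T + 1, np.nan)               # index 0 unused; undefined entries stay NaN
    for i in range(start + 1, T + 1):
        j = np.arange(start, i)                  # j = start, ..., i-1 (strictly before i)
        w = kernel((i - j) / h)                  # one-sided weights K([i-j]/h) > 0
        m_hat[i] = np.dot(w, y[j - 1]) / np.sum(w)   # the 1/h factors of (2.1) cancel
    return m_hat

y_const = np.full(50, 3.0)
m_const = kernel_predictions(y_const, h=5.0, gamma=0.1)
```

Because the weights are normalized, a constant series is reproduced exactly, and changing $Y_i$ leaves $\hat m_{h,-i}$ unchanged, which is the previsibility property used throughout.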

To this end, let $\mathcal{F}_n$ be the natural filtration associated to $\{\epsilon_n\}$. Substituting $h$ in $\hat m_{h,-i}$ by a row-wise $\mathcal{F}_i$-adapted array $h_{Ti}$, $\lfloor T s_0 \rfloor \le i \le T$, $T \ge 1$, of non-negative random variables yields again an adapted array $\{\hat m_{h_{Ti},-i}\}$ to which we apply one-sided detection procedures given by the first exit stopping times

\[ S^{+} = \inf\{ \lfloor s_0 T \rfloor \le i \le T : \hat m_{h_{Ti},-i} > c \}, \tag{2.4} \]
\[ S^{-} = \inf\{ \lfloor s_0 T \rfloor \le i \le T : \hat m_{h_{Ti},-i} < c \}, \tag{2.5} \]

respectively, where $s_0 > \gamma$ determines the start of monitoring. Given the predictions $\hat m_{h,-i}$, we may define the sequential leave-one-out cross-validation criterion

\[ CV_s(h) = CV_{T,s}(h) = \frac{1}{T} \sum_{i=\lfloor T s_0 \rfloor}^{\lfloor T s \rfloor} (Y_i - \hat m_{h,-i})^2, \qquad h > 0, \]

a function of the candidate bandwidth $h$. In the functional cross-validation bandwidth approach the cross-validation objective function is minimized for each $s \in [s_0, 1]$. To do so, let $\mathcal{H}_{s_0,\xi}$ be the family of all arrays $\{h_{Tn} : \lfloor s_0 T \rfloor \le n \le T, T \ge 1\}$ with

\[ \max_{1 \le n \le T} |T/h_{Tn} - \xi| = O(1/T) \quad \text{for some } \xi > 0. \]

We consider minimizers $\{h^*_{Tn}\} \in \mathcal{H}_{s_0,\xi}$ of the cross-validation criterion such that

\[ CV_{n/T}(h^*_{Tn}) \le CV_{n/T}(h_{Tn}), \qquad \lfloor s_0 T \rfloor \le n \le T, \ T \ge 1, \]

for all $\{h_{Tn}\} \in \mathcal{H}_{s_0,\xi}$. This leads to the functional cross-validated bandwidth estimator

\[ h^*_T(s) = h^*_{T,\lfloor Ts \rfloor}, \qquad s \in [s_0, 1]. \]

Notice that, by definition, $\{h^*_T(s) : s \in [s_0, 1]\}$ is $\mathcal{F}_{\lfloor Ts \rfloor}$-adapted.
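The sequential criterion and its minimization for several values of $s$ can be sketched as follows. This is a hedged illustration only: the kernel $K(x) = e^{-x}$, the finite candidate grid and the helper names `predictions`, `cv_criterion` and `cv_bandwidth_path` are assumptions made for the example and do not appear in the original text.

```python
import numpy as np

def predictions(y, h, gamma):
    """One-step-ahead kernel predictions; entry i (1-based) uses Y_j, floor(T*gamma) <= j < i."""
    T = len(y)
    start = max(int(np.floor(T * gamma)), 1)
    m_hat = np.full(T + 1, np.nan)               # index 0 unused
    for i in range(start + 1, T + 1):
        j = np.arange(start, i)
        w = np.exp(-(i - j) / h)                 # assumed kernel K(x) = exp(-x)
        m_hat[i] = np.dot(w, y[j - 1]) / np.sum(w)
    return m_hat

def cv_criterion(y, h, s, s0, gamma):
    """CV_s(h): average squared prediction error over floor(T*s0) <= i <= floor(T*s)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    m_hat = predictions(y, h, gamma)
    i = np.arange(int(np.floor(T * s0)), int(np.floor(T * s)) + 1)
    return float(np.sum((y[i - 1] - m_hat[i]) ** 2) / T)

def cv_bandwidth_path(y, s_values, s0, gamma, grid):
    """For each s return the grid value minimizing CV_s(h); only data up to time s enter."""
    path = []
    for s in s_values:
        values = [cv_criterion(y, h, s, s0, gamma) for h in grid]
        path.append(grid[int(np.argmin(values))])
    return path

rng = np.random.default_rng(0)
y = rng.standard_normal(200)
grid = [5.0, 20.0, 80.0]
path = cv_bandwidth_path(y, [0.5, 0.75, 1.0], s0=0.2, gamma=0.1, grid=grid)
```

The resulting `path` is a discretized version of the functional bandwidth estimate $h^*_T(s)$; the grid search stands in for the minimization over $\mathcal{H}_{s_0,\xi}$.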



Steland (2010) showed that, under regularity assumptions, $CV_{T,s}(h)$ converges to some function $CV_\xi(s)$ which depends on $\xi = \lim T/h$. That result is valid for stationary $\alpha$-mixing series and for L2-near epoch dependent time series. For i.i.d. error terms one even achieves the usual $O(1/\sqrt{T})$ rate of convergence in the sense of L2 convergence. Having those results in mind, we now address the related weak convergence theory.

Since in practice the cross-validation criterion has to be minimized numerically, one may assume that minimization is done over a finite grid of values, and we shall provide a weak convergence result for the cross-validated bandwidth under such an assumption. Further, conducting cross-validation at each time point can be infeasible in a practical application, such that one has to select $N$ time points, $s_0 < s_1 < \dots < s_N$, where the cross-validation criterion is numerically minimized, thus yielding an adapted sequence $h^*_i = h^*_T(s_i)$, $i = 1, \dots, N$. The cross-validated bandwidth $h^*_i$ is then used during the time interval $[s_i, s_{i+1})$, $i = 1, \dots, N$. Clearly, the corresponding cross-validation bandwidth estimator is now the step function

\[ h^*_{TN}(s) = h^*_T(s_i), \qquad s \in [s_i, s_{i+1}), \ i = 1, \dots, N - 1. \]

In such a situation, it is sufficient to know the convergence of the finite dimensional distributions (fidi convergence).
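The step-function bandwidth can be evaluated by a simple lookup. The sketch below is only an illustration; the function name `step_bandwidth` and the convention of keeping the last selected bandwidth for $s \ge s_N$ are assumptions not stated in the text.

```python
import numpy as np

def step_bandwidth(s, times, bandwidths):
    """Return the bandwidth in force at time s: bandwidths[i] on [times[i], times[i+1])."""
    times = np.asarray(times, dtype=float)
    if s < times[0]:
        raise ValueError("no bandwidth has been selected before the first selection time")
    # index of the last selection time that is <= s
    idx = int(np.searchsorted(times, s, side="right")) - 1
    return bandwidths[idx]

times = [0.2, 0.5, 0.8]        # hypothetical selection times s_1 < s_2 < s_3
selected = [10.0, 20.0, 30.0]  # hypothetical cross-validated bandwidths at those times
current = step_bandwidth(0.6, times, selected)
```

Here `current` is the bandwidth chosen at $s_2 = 0.5$, because $0.6 \in [0.5, 0.8)$.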



3 PRELIMINARIES AND WEAK CONVERGENCE OF STOCHASTIC INTEGRALS



Since $L_T$ and $Q_T$ are random cadlag functions, it is in order to recall some basic facts on the Skorohod spaces $D([a,b]; \mathbb{R}^l)$, $l$ an integer, consisting of those functions $[a,b] \to \mathbb{R}^l$, $a, b \in \mathbb{R}$, being right-continuous with existing limits from the left. Let $V(f)$ denote the (total) variation semi-norm of a function $f$ and $\|f\|_\infty$ its supnorm. For a random variable $X$ we denote by $\|X\|_p$ the $L_p$-norm, $p \in [1, \infty)$. The space $D([a,b]; \mathbb{R}^l)$ can be equipped with the following Skorohod metric. For two functions $f, g$ on $[a, b]$ with values in $\mathbb{R}^l$ define

\[ d(f, g) = \inf_{\lambda \in \Lambda} \max\{ \|f \circ \lambda - g\|_\infty, \ \|\lambda - \mathrm{id}\|_\infty \}, \]

where $\Lambda$ is the set of all strictly increasing continuous mappings $\lambda : [a, b] \to [a, b]$. Clearly, $d(f, g) \le \|f - g\|_\infty$ such that uniform convergence implies convergence in the Skorohod metric. Weak convergence of a sequence $\{X, X_n\}$ of random functions taking values in $D([a,b]; \mathbb{R}^l)$ now means weak convergence of the measures $P_{X_n}$ to $P_X$, as $n \to \infty$, denoted by $X_n \Rightarrow X$, $n \to \infty$. For the sake of clarity of exposition, we shall also write $X_n(u) \Rightarrow X(u)$, as $n \to \infty$. Further details can be found in Bickel and Wichura (1971), Neuhaus (1971), Straf (1972) and Seijo and Sen (2011).



The framework for the weak convergence result for Ito integrals is as follows. Let us first recall the definition of the Ito integral, cf. Protter (2005) or Steland (2012). Let $\{H_n\}$ and $\{X_n\}$ be sequences of adapted processes on a probability space $(\Omega, \mathcal{F}, P)$ which is equipped with a sequence $\{\mathcal{F}_n\}$ of filtrations $\mathcal{F}_n = \{\mathcal{F}_{nt} : t \in I\}$ with index set $I$, i.e. $X_n$ are $\mathcal{F}_{nt}$-adapted such that $H_n(t)$, $X_n(t)$ are $\mathcal{F}_{nt}$-measurable, $t \in I$. In general, a process $X$ is called a semimartingale, if $X = M + A$ for some local martingale $M$ and a process $A$ having bounded variation. Given a semimartingale $X$ and a predictable cadlag process $H$, one may define the stochastic Ito integral

\[ \int H \, dX = \left\{ \int_0^t H(s-) \, dX(s) : t \in I \right\}. \]

When we equip the space $L(I; \mathbb{R})$ of left continuous functions possessing right-hand limits with the topology induced by the uniform convergence on compact sets, the linear operator $I(\cdot) = \int \cdot \, dX$ is continuous on $L(I; \mathbb{R})$, such that uniform convergence $H_n \to H$ on compact sets of a sequence $\{H, H_n\}$ of such adapted processes implies convergence of the Ito integral, in probability, and therefore also weakly. The following result extends the latter fact to the much more involved case that the integrator depends on $n$.

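Theorem 3.1. (Kurtz and Protter, 1996)

Suppose that $X_n$ is, for each $n \in \mathbb{N}$, an $\mathcal{F}_{nt}$-adapted semimartingale with Doob decomposition $X_n = M_n + A_n$ such that $\sup_n \operatorname{Var}(X_n) + V(A_n) < \infty$, and $H_n$ is $\mathcal{F}_{nt}$-predictable. If $(H_n, X_n) \Rightarrow (H, X)$, $n \to \infty$, in the Skorohod space $D([a,b]; \mathbb{R}^2)$, then

\[ \left( H_n, X_n, \int H_n \, dX_n \right) \Rightarrow \left( H, X, \int H \, dX \right), \]

as $n \to \infty$, in $D([a, b]; \mathbb{R}^3)$.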

We will apply that result to the following framework. Assume that the $\epsilon_n$ are defined on a common probability space $(\Omega, \mathcal{F}, P)$ which we equip with a sequence of filtrations $\mathcal{F}_{nt}$. For simplicity, one may consider the natural filtrations $\mathcal{F}_{nt} = \sigma(\epsilon_i : i \le \lfloor nt \rfloor)$, $t \in [s_0, 1]$, $n \in \mathbb{N}$, in what follows, but the results hold true for any sequence of filtrations such that $\epsilon_n$ is $\mathcal{F}_n = \mathcal{F}_{n,1}$ adapted. Our assumptions on $\delta$ ensure that $t \mapsto \int_0^t \delta \, d\lambda$, $\lambda$ denoting Lebesgue measure, exists and defines a function of bounded variation.

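Lemma 3.1. Suppose that $\{\epsilon_t\}$ is an $\mathcal{F}_n$-martingale difference sequence under $P$. Then the partial sum process

\[ S_T(u) = T^{-1/2} \sum_{i=1}^{\lfloor Tu \rfloor} Y_i, \qquad u \in [0, 1], \ T \ge 1, \tag{3.1} \]

defines a sequence of semimartingales. If, additionally, $\{\epsilon_t\}$ satisfies an invariance principle, i.e.

\[ T^{-1/2} \sum_{i=1}^{\lfloor Ts \rfloor} \epsilon_i \Rightarrow \sigma B(s), \]

as $T \to \infty$, in $D([0, 1]; \mathbb{R})$ for some constant $\sigma \in (0, \infty)$ and Brownian motion $B$, then

\[ S_T \Rightarrow B^\delta = \int \delta \, d\lambda + \sigma B, \tag{3.2} \]

as $T \to \infty$, in $D([0, 1]; \mathbb{R})$.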

Proof. Notice that $S_T$ attains the decomposition $S_T(u) = T^{-1/2} \sum_{i=1}^{\lfloor Tu \rfloor} \epsilon_i + A_T(u)$, where the first term is a martingale and $A_T(u) = T^{-1/2} \sum_{i=1}^{\lfloor Tu \rfloor} \delta_T(x_i) = \frac{1}{T} \sum_{i=1}^{\lfloor Tu \rfloor} \delta(i/T)$ is non-random. But due to Koksma's theorem,

\[ \left| A_T(u) - \int_0^u \delta(z) \, dz \right| \le V(\delta) T^{-1}, \]

where the upper bound is independent of $u$. The variation of the step function $A_T(u)$ is $T^{-1} \sum_{i=1}^{T} |\delta(i/T)|$, which converges if $\delta$ is piecewise Lipschitz with a finite number of finite jumps, and is therefore bounded in $T \ge 1$. Hence, $S_T$ is a semimartingale. □
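The Koksma-type bound used in the proof can be checked numerically for a concrete drift. The sketch below is a sanity check under assumptions, not part of the proof: it takes $\delta$ to be a single unit level shift at $0.5$, so that $V(\delta) = 1$ and $\int_0^u \delta(z)\,dz = \max(u - 0.5, 0)$. The names `drift_partial_sum`, `delta` and `integral` are hypothetical.

```python
import numpy as np

def drift_partial_sum(delta, T, u):
    """A_T(u) = (1/T) * sum of delta(i/T) over 1 <= i <= floor(T*u)."""
    k = int(np.floor(T * u))
    i = np.arange(1, k + 1)
    return float(np.sum(delta(i / T)) / T)

delta = lambda z: np.where(z >= 0.5, 1.0, 0.0)   # one level shift, total variation 1
integral = lambda u: max(u - 0.5, 0.0)            # closed form of the integral of delta

T = 200
grid = np.linspace(0.0, 1.0, 1001)
errors = [abs(drift_partial_sum(delta, T, u) - integral(u)) for u in grid]
max_error = max(errors)
```

For this choice the maximal deviation over the grid does not exceed $V(\delta)/T = 1/200$, uniformly in $u$, in line with the displayed bound.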

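We shall impose mixing conditions on the innovation process $\{\epsilon_t : t \in \mathbb{Z}\}$, which is assumed to be indexed by the integers. Recall that $\{\epsilon_t\}$ is called $\alpha$-mixing, if $\alpha(k) = o(1)$, as $k \to \infty$, where for $k \in \mathbb{N}_0$

\[ \alpha(k) = \sup_{A \in \mathcal{F}_{-\infty}^{0}, \, B \in \mathcal{F}_{k}^{\infty}} |P(A \cap B) - P(A)P(B)| \]

denotes the $\alpha$-mixing coefficient and $\mathcal{F}_a^b = \sigma(\epsilon_t : a \le t \le b)$ for $-\infty \le a \le b \le \infty$. $\{\epsilon_t\}$ is called $\phi$-mixing, if $\phi(k) = o(1)$, as $k \to \infty$, where

\[ \phi(k) = \sup_{A \in \mathcal{F}_{k}^{\infty}, \, B \in \mathcal{F}_{-\infty}^{0}, \, P(B) > 0} |P(A \mid B) - P(A)|. \]

One can check that $\alpha(k) \le \phi(k)$, see Doukhan (1994) or Athreya and Lahiri (2006).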



4 ASYMPTOTIC THEORY FOR SEQUENTIAL CROSS-VALIDATION 

We shall now study the weak convergence theory of the sequential cross-validation bandwidth procedure.

Let us first identify the random processes which we have to investigate. Notice that for any $s \in [s_0, 1]$ and $h > 0$ we have

\[ CV_s(h) = \frac{1}{T} \sum_{i=\lfloor T s_0 \rfloor}^{\lfloor Ts \rfloor} Y_i^2 - \frac{2}{T} \sum_{i=\lfloor T s_0 \rfloor}^{\lfloor Ts \rfloor} Y_i \hat m_{h,-i} + \frac{1}{T} \sum_{i=\lfloor T s_0 \rfloor}^{\lfloor Ts \rfloor} \hat m_{h,-i}^2, \]

such that minimizing $CV_s(h)$ is equivalent to minimizing the random function

\[ C_{T,s}(h) = L_T(s) + Q_T(s), \]

on which we shall focus in the sequel. Here the cadlag processes $\{L_T(s) : s \in [s_0, 1]\}$ and $\{Q_T(s) : s \in [s_0, 1]\}$ are defined by

\[ L_T(s) = -\frac{2}{T} \sum_{i=\lfloor T s_0 \rfloor}^{\lfloor Ts \rfloor} Y_i \hat m_{h,-i}, \]

\[ Q_T(s) = \frac{1}{T} \sum_{i=\lfloor T s_0 \rfloor}^{\lfloor Ts \rfloor} \hat m_{h,-i}^2, \]

for $s \in [s_0, 1]$; for our study it will be convenient to omit the $h$ in the notation.
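The decomposition of the criterion into a term free of $h$ plus $L_T(s) + Q_T(s)$ is a purely algebraic identity, which the following sketch verifies numerically. The simulated data, the kernel $K(x) = e^{-x}$ and all variable names are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T, h, gamma, s0, s = 120, 12.0, 0.1, 0.25, 0.9
y = rng.standard_normal(T)

# one-sided kernel predictions m_hat[i], 1-based, with the assumed kernel K(x) = exp(-x)
start = int(np.floor(T * gamma))
m_hat = np.full(T + 1, np.nan)
for i in range(start + 1, T + 1):
    j = np.arange(start, i)
    w = np.exp(-(i - j) / h)
    m_hat[i] = np.dot(w, y[j - 1]) / np.sum(w)

idx = np.arange(int(np.floor(T * s0)), int(np.floor(T * s)) + 1)
cv = np.sum((y[idx - 1] - m_hat[idx]) ** 2) / T      # CV_s(h)
const = np.sum(y[idx - 1] ** 2) / T                   # term that does not depend on h
L = -2.0 / T * np.sum(y[idx - 1] * m_hat[idx])        # L_T(s)
Q = np.sum(m_hat[idx] ** 2) / T                       # Q_T(s)
```

Since `const` does not involve the bandwidth, minimizing `cv` over $h$ and minimizing `L + Q` over $h$ select the same bandwidth.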

We shall see that $L_T$ and $Q_T$ have different convergence rates, $Q_T$ being the leading term which determines the asymptotics of $C_{T,s}(h)$ for large $T$. After scaling appropriately their weak limits turn out to be functionals of the process

\[ B^\delta = \int \delta \, d\lambda + \sigma B \tag{4.1} \]

which appears as the limit of the partial sum process of the observations, $Y_n = Y_{Tn}$, confer Lemma 3.1. Recall that $m_0 + \delta/\sqrt{T}$ is the regression function after the (first) change-point. Thus, the limit theorems show the effect of a general departure from the no-change model $m_0$ given by the function $\delta$, which appears as the drift in the semimartingale (4.1).

As already mentioned in the previous section, we shall impose weak conditions on the $\alpha$-mixing coefficients of the innovation process $\{\epsilon_t\}$ of martingale differences. Indeed, those conditions are naturally satisfied by many time series studied in the literature. As an example, consider the GARCH($p$, $q$) model given by

\[ \epsilon_t = \sigma_t \eta_t, \qquad \sigma_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i \epsilon_{t-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2, \]

where $\{\eta_t\}$ are i.i.d.(0, 1) random variables, $\alpha_p \beta_q \ne 0$, $\alpha_i \ge 0$ and $\beta_j \ge 0$ for $i = 1, \dots, p$ and $j = 1, \dots, q$. It is known that a strictly stationary GARCH($p$, $q$) process is $\phi$-mixing with geometrically decreasing $\phi$-mixing coefficients, if $\eta_t$ attains a Lebesgue density, cf. Doukhan (1994). This implies geometrically decreasing $\alpha$-mixing coefficients, which in turn implies that the conditions imposed in the results of the present section on the $\alpha$-mixing coefficients are satisfied.
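For intuition, errors of this type are easy to simulate. The following is a minimal sketch for the special case GARCH(1, 1) with standard normal innovations; the parameter values, the burn-in length and the function name `simulate_garch11` are assumptions chosen for illustration (with $\alpha_1 + \beta_1 < 1$ so that a stationary solution with finite variance exists).

```python
import numpy as np

def simulate_garch11(n, alpha0=0.1, alpha1=0.1, beta1=0.8, burn_in=500, seed=0):
    """Simulate eps_t = sigma_t * eta_t with sigma_t^2 = alpha0 + alpha1*eps_{t-1}^2 + beta1*sigma_{t-1}^2."""
    rng = np.random.default_rng(seed)
    total = n + burn_in
    eta = rng.standard_normal(total)                        # i.i.d.(0, 1) innovations with a density
    eps = np.zeros(total)
    sigma2 = np.full(total, alpha0 / (1.0 - alpha1 - beta1))   # start at the stationary variance
    for t in range(1, total):
        sigma2[t] = alpha0 + alpha1 * eps[t - 1] ** 2 + beta1 * sigma2[t - 1]
        eps[t] = np.sqrt(sigma2[t]) * eta[t]
    return eps[burn_in:], sigma2[burn_in:]                  # drop the burn-in segment

eps, sigma2 = simulate_garch11(1000)
```

Each `eps[t]` has conditional mean zero given the past, so the simulated series is a martingale difference sequence with time-varying conditional variance, the kind of error process the results of this section are meant to cover.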

4.1 The Process $Q_T$

Let us start our theoretical investigation with the more involved process $Q_T$.

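Theorem 4.1. (i) Suppose that $\{\epsilon_t\}$ is a mean zero stationary martingale difference sequence which satisfies an invariance principle. Then, the fidi convergence of the process $T^2 Q_T$ is given by

\[ T^2 (Q_T(s_1), \dots, Q_T(s_N)) \xrightarrow{d} \operatorname{diag}\left( \int G^\delta(v) \, dB^\delta(v) \right), \]

as $T \to \infty$, where $\int G^\delta(v) \, dB^\delta(v)$ is the process

\[ \left\{ \left( \int_\gamma^{s} \mathbf{1}_{\{v \le s_i\}} \left[ \int_\gamma^{s_i} g^{s_i,v}(u) \, dB^\delta(u) \right] dB^\delta(v) \right)_{i=1}^{N} : s \in [s_0, 1] \right\} \]

for fixed time instants $s_0 \le s_1 < \dots < s_N \le 1$. Here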



\[ B^\delta(u) = \int_0^u \delta(t) \, dt + \sigma B(u), \qquad u \in [0, 1], \tag{4.2} \]

and

\[ g^{s,v}(u) = \int_\gamma^s D(\xi u, \xi v, \xi w) \, N^{-2}(w) \, dw, \qquad u, v \in [0, w], \ s \in [s_0, 1], \tag{4.3} \]

where

\[ D(u, v, w) = \begin{cases} K(w - u) K(w - v), & u, v, w \in [0, \infty), \ u, v \le w, \\ 0, & \text{otherwise}, \end{cases} \tag{4.4} \]

\[ N(w) = \int_\gamma^w K(\xi(w - z)) \, dz, \qquad w \in [\gamma, 1]. \tag{4.5} \]

(ii) Let {e„} be a strictly stationary martingale difference sequence with E{ei) = 0, E{e^) < oo 
for some 5 > and a-mixing coefficients, a{k), satisfying 

oo oo 

< oo, 

k=0 k=l 

for some $\zeta \in (0, 1)$. Then the process $\{Q_T(s) : s \in [s_0, 1]\}$ is tight and therefore converges weakly.

Proof. Denote by $S_T$ the partial sum process introduced in Lemma 3.1. Either by the assumption stated in (i) or under the moment and mixing conditions imposed in (ii), we have the weak convergence

\[ S_T(u) \Rightarrow B^\delta(u) = \int_0^u \delta(t) \, dt + \sigma B(u) \]

as $T \to \infty$, since we may apply Herrndorf (1984, Corollary 1) with $\beta = 4$ under condition (ii). Indeed, the conditions on the mixing coefficients are stronger than required there and $E\left(\sum_{i=1}^n \epsilon_i\right)^2 / n = E\epsilon_1^2 < \infty$ holds true for any strictly stationary martingale difference sequence $\{\epsilon_t\}$. We shall now apply the Skorohod representation theorem which asserts that on a new probability space equivalent versions of the processes $\{S_T(u) : u \in [s_0, 1]\}$ and $\{B^\delta(u) : u \in [s_0, 1]\}$ can be defined, which we will again denote by $S_T$ and $B^\delta$, such that

\[ \|S_T - B^\delta\|_\infty \to 0, \quad \text{a.s.}, \]

as $T \to \infty$. Let us consider the quadratic form $Q_T(s)$. Notice that



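\[ Q_T(s) = \frac{1}{T} \sum_{i=\lfloor T s_0 \rfloor}^{\lfloor Ts \rfloor} N_{T,-i}^{-2} \, \frac{1}{h^2} \sum_{j,k=\lfloor T\gamma \rfloor}^{i-1} D(j/h, k/h, i/h) \, Y_j Y_k, \]

where the function $D : [0, \infty)^3 \to [0, \infty)$ is defined in (4.4). Using the fact that

\[ \frac{1}{T} \sum_{i=\lfloor T s_0 \rfloor}^{\lfloor Ts \rfloor - 1} K(i/h - j/h) K(i/h - k/h) = \int_{\lfloor T s_0 \rfloor / T}^{\lfloor Ts \rfloor / T - 1/T} K(\lfloor Tx \rfloor / h - j/h) K(\lfloor Tx \rfloor / h - k/h) \, dx, \]

we may represent $T^2 Q_T(s)$ via (Ito) integrals, namely

\[ T^2 Q_T(s) = \int_{\lfloor T s_0 \rfloor / T}^{\lfloor Ts \rfloor / T} \int_{\lfloor T\gamma \rfloor / T}^{\lfloor Ts \rfloor / T} \int_{\lfloor T\gamma \rfloor / T}^{\lfloor Ts \rfloor / T} D(uT/h, vT/h, \lfloor Tw \rfloor / h) \, N_T^{-2}(w) \, dS_T(u) \, dS_T(v) \, dw \]

\[ \qquad = \int_{\lfloor T\gamma \rfloor / T}^{\lfloor Ts \rfloor / T} \int_{\lfloor T\gamma \rfloor / T}^{\lfloor Ts \rfloor / T} \int_{\lfloor T s_0 \rfloor / T}^{\lfloor Ts \rfloor / T} D(uT/h, vT/h, \lfloor Tw \rfloor / h) \, N_T^{-2}(w) \, dw \, dS_T(u) \, dS_T(v), \]

where

\[ N_T(w) = \frac{1}{T} \sum_{j=\lfloor T\gamma \rfloor}^{\lfloor Tw \rfloor - 1} K(\lfloor Tw \rfloor / h - j/h) = \int_{\lfloor T\gamma \rfloor / T}^{\lfloor Tw \rfloor / T - 1/T} K(\lfloor Tw \rfloor / h - \lfloor Tz \rfloor / h) \, dz, \qquad w > 0. \tag{4.6} \]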
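The closeness of the Riemann-type sum $N_T(w)$ in (4.6) to the integral $N(w)$ in (4.5) can be illustrated numerically. The sketch below is an illustration under assumptions: it uses the kernel $K(x) = e^{-x}$, for which $N(w) = (1 - e^{-\xi(w-\gamma)})/\xi$ in closed form, takes $h = T/\xi$, and starts the sum at $j = \lfloor T\gamma \rfloor$. The function names `N_limit` and `N_T` are hypothetical.

```python
import numpy as np

def N_limit(w, xi, gamma):
    """N(w) = integral over [gamma, w] of exp(-xi*(w - z)) dz, in closed form."""
    return (1.0 - np.exp(-xi * (w - gamma))) / xi

def N_T(w, T, xi, gamma):
    """N_T(w) = (1/T) * sum over floor(T*gamma) <= j <= floor(T*w)-1 of K(floor(T*w)/h - j/h)."""
    h = T / xi                                   # bandwidth with T/h = xi exactly
    hi = int(np.floor(T * w))
    lo = int(np.floor(T * gamma))
    j = np.arange(lo, hi)                        # j = lo, ..., hi - 1
    return float(np.sum(np.exp(-(hi - j) / h)) / T)

T, xi, gamma = 1000, 2.0, 0.1
ws = np.linspace(0.2, 1.0, 81)
sup_error = max(abs(N_T(w, T, xi, gamma) - N_limit(w, xi, gamma)) for w in ws)
```

In this example the maximal deviation over the grid is of order $1/T$, which is the rate needed for the uniform convergence of $N_T^{-2}$ to $N^{-2}$ away from $\gamma$.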

The first step will be to apply Theorem 3.1 to obtain weak convergence of the inner Ito integral. The second step, a diagonal argument, will then yield the fidi convergence. Lastly, we verify tightness under the conditions given in (ii). Clearly, we expect that $N_T(w)$ converges to the function $N(w) = \int_\gamma^w K(\xi(w - z)) \, dz$, $w \in [s_0, 1]$. Since for $w \ge \gamma$ we have $N(w) \ge N(\gamma) > 0$ and, of course,

\[ \sup_{w \in [\lfloor T\gamma \rfloor / T, 1]} |N_T(w) - N(w)| \le V(K) T^{-1} \tag{4.7} \]

by virtue of Koksma's theorem, yielding $|N_T^{-2}(w) - N^{-2}(w)| \to 0$, as $T \to \infty$, uniformly in $w \in [\gamma, 1]$. Fix $s > s_0$ and $v$. Define for $u \in [0, 1]$

\[ g_T^{s,v}(u) = \int_{\lfloor T\gamma \rfloor / T}^{\lfloor Ts \rfloor / T} D(uT/h, vT/h, \lfloor Tw \rfloor / h) \, N_T^{-2}(w) \, dw, \]

\[ g^{s,v}(u) = \int_\gamma^s D(\xi u, \xi v, \xi w) \, N^{-2}(w) \, dw. \]

For what follows, we need to verify that $g_T^{s,v} \to g^{s,v}$ in the uniform topology, and that $g_T^{s,v}$ has uniformly bounded variation. Clearly, $|g_T^{s,v}(u) - g^{s,v}(u)|$ can be bounded by

\[ O(T^{-1}) + \int_\gamma^{\lfloor Ts \rfloor / T} \left| D(uT/h, vT/h, \lfloor Tw \rfloor / h) \, N_T^{-2}(w) - D(\xi u, \xi v, \xi w) \, N^{-2}(w) \right| dw. \]

Let $A_T = \{ (u, v, w) : \lfloor T\gamma \rfloor \le u, v, \ \lfloor Tw \rfloor / h \le \lfloor Ts \rfloor / T \}$. On the set $A_T$ the above integrand equals $|K(\lfloor Tw \rfloor / h - uT/h) K(\lfloor Tw \rfloor / h - vT/h) N_T^{-2}(w) - K(\xi(w - u)) K(\xi(w - v)) N^{-2}(w)|$. Recall the fact that for sequences of mappings $\{a, a_T\}$, $\{b, b_T\}$ taking values in some normed space with norm $\|\cdot\|$, we have $a_T b_T \to ab$, as $T \to \infty$, provided $a_T \to a$, $b_T \to b$ and $\|a\|$, $\sup_{T \ge 1} \|b_T\| < \infty$. Apply that result with $a_T(u, v, w) = K(\lfloor Tw \rfloor / h - uT/h) K(\lfloor Tw \rfloor / h - vT/h)$, $a(u, v, w) = K(\xi(w - u)) K(\xi(w - v))$, $b_T(w) = N_T^{-2}(w)$ and $b(w) = N^{-2}(w)$. By boundedness and Lipschitz continuity of $K$ and due to (4.7) we may conclude that

\[ \sup_{u} |g_T^{s,v}(u) - g^{s,v}(u)| \to 0, \tag{4.8} \]

as $T \to \infty$; that convergence is even uniform in $s \in [s_0, 1]$. Before proceeding, let us check that $g_T^{s,v}$ is of uniformly bounded variation, such that the uniform limit $g^{s,v}$ is of bounded variation as well. Clearly, $g_T^{s,v}$ is a step function with jumps at $k/T$, $k = \lfloor T\gamma \rfloor, \dots, \lfloor Ts \rfloor - 1$, of size not larger than $T^{-1} \|K\|_\infty^2 / N(\gamma)^2$ in absolute value. Thus, for any partition $\{\zeta_i\}$, arbitrary $s \in [s_0, 1]$ and $v \le w$, the variation $\sum_i |g_T^{s,v}(\zeta_{i+1}) - g_T^{s,v}(\zeta_i)|$ can be bounded by $\|K\|_\infty^2 / N(\gamma)^2$, yielding

\[ \sup_{T \ge 1} \sup_{s, v} V(g_T^{s,v}) < \infty. \tag{4.9} \]
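By (4.8), we may conclude (take $\lambda = \mathrm{id}$) that, for fixed $v, s$,

\[ \inf_{\lambda \in \Lambda} \max\left\{ \left\| \sqrt{ (S_T \circ \lambda(\cdot) - B^\delta \circ \lambda(\cdot))^2 + (g_T^{s,v} \circ \lambda(\cdot) - g^{s,v} \circ \lambda(\cdot))^2 } \right\|_\infty, \ \|\lambda - \mathrm{id}\|_\infty \right\} = o(1), \tag{4.10} \]

as $T \to \infty$, a.s., where the $o(1)$ is even uniform in $u, v$ and $\|\cdot\|_\infty$ denotes the supnorm over $[\gamma, 1]$. This means $d((g_T^{s,v}, S_T), (g^{s,v}, B^\delta)) \to 0$, as $T \to \infty$, a.s., which, of course, implies weak convergence by virtue of the second half of the Skorohod/Dudley/Wichura representation theorem, i.e.

\[ (g_T^{s,v}, S_T) \Rightarrow (g^{s,v}, B^\delta), \]

as $T \to \infty$, in the Skorohod space $D([\gamma, 1]; \mathbb{R}^2)$. We may apply Theorem 3.1 to conclude that

\[ (g_T^{s,v}, S_T, W_T^{s,v}) \Rightarrow (g^{s,v}, B^\delta, W^{s,v}), \]

in $D([\gamma, 1]; \mathbb{R}^3)$, as $T \to \infty$, for the equivalent versions, where the processes $\{W_T^{s,v}(t) : t \in [\gamma, 1]\}$ and $\{W^{s,v}(t) : t \in [\gamma, 1]\}$, $T \ge 1$, are defined by

\[ W_T^{s,v}(t) = \int_\gamma^t g_T^{s,v}(u) \, dS_T(u), \qquad W^{s,v}(t) = \int_\gamma^t g^{s,v}(u) \, dB^\delta(u). \]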



The second step is a diagonal argument: Fix $N \in \mathbb{N}$ and points $s_1, \dots, s_N \in [s_0, 1]$ with $s_1 < \dots < s_N$. Put for $T \ge 1$

\[ G_T(v) = \left( \mathbf{1}_{\{v \le s_1\}} \int g_T^{s_1,v}(u) \, dS_T(u), \dots, \mathbf{1}_{\{v \le s_N\}} \int g_T^{s_N,v}(u) \, dS_T(u) \right). \]

Let us check that $d(G_T, G) = o(1)$, as $T \to \infty$, where $d$ denotes the Skorohod metric on $D([s_0, 1]; \mathbb{R}^N)$. Consider for $i = 1, \dots, N$,

\[ \int_{\lfloor T\gamma \rfloor / T} g_T^{s_i,v} \, dS_T - \int_\gamma g^{s_i,v} \, dB^\delta = O(T^{-1}) + \int_\gamma (g_T^{s_i,v} - g^{s_i,v}) \, dB^\delta + \int_\gamma g_T^{s_i,v} \, d(S_T - B^\delta). \]

The first integral on the right side converges in probability to 0, as $T \to \infty$, since our assumptions on $\delta$ ensure that $B^\delta$ is a semimartingale. The second integral can be interpreted as a stochastic Stieltjes integral, since the integrand is of (uniformly) bounded variation. Using integration by parts, (4.8) and (4.9), we see that, with $\|\cdot\|_\infty$ denoting the supnorm over $[s_0, 1]$,

\[ \left| \int_\gamma g_T^{s_i,v} \, d(S_T - B^\delta) \right| \le 2 \sup \|g_T^{s,v}\|_\infty \, \|S_T - B^\delta\|_\infty + \|S_T - B^\delta\|_\infty \sup V(g_T^{s,v}), \]

but the right side converges to 0, as $T \to \infty$, a.s. Now $d(G_T, G) = o(1)$ follows easily. A further application of Theorem 3.1 yields

\[ \left( G_T, S_T, \int G_T \, dS_T \right) \Rightarrow \left( G^\delta, B^\delta, \int G^\delta \, dB^\delta \right), \]

as $T \to \infty$, in $D([s_0, 1]; \mathbb{R}^{2N+1})$, where by definition $\int G^\delta \, dB^\delta$ is the process

\[ \left\{ \left( \int \mathbf{1}_{\{v \le s_i\}} \left[ \int g^{s_i,v}(u) \, dB^\delta(u) \right] dB^\delta(v) \right)_{i=1}^{N} : s \in [s_0, 1] \right\}. \]

Now we sample the process $\int G^\delta \, dB^\delta$ at the points $s_1, \dots, s_N$. Then the diagonal of the $N \times N$ matrix with $i$th row given by the vector

\[ \int^{s_i} G^\delta(v) \, dB^\delta(v), \]

$i = 1, \dots, N$, equals $(Q_T(s_1), \dots, Q_T(s_N))$. Consequently, we may conclude that

\[ T^2 (Q_T(s_1), \dots, Q_T(s_N)) \xrightarrow{d} \operatorname{diag}\left( \int G^\delta(v) \, dB^\delta(v) \right), \]

as T — )• cxD, which completes the proof of (i). Let us now verify that under the assumptions given in 
(ii) tightness of T'^Qt follows. Let sq < a < 6 < 1 and notice that due to ([TT]) for [Taj < i < [Tb\ 



E{mh,-i) = -j= 



< 



i-l i-1 

K{{i-j)/h)E{Y,)/Y,K{{i-j)/h) 

j=lT-y\ j=l 

infse[a,fe] n K{^{s - z)) dz + o(l) 



Vf 

= 0{1/VT), 

by positivity of the kernel, where the o(l) terms are uniform in s and i, by virtue of Koksma's theorem. 
We have 



E{T\QT{b)-QT{a)]f = T' ^ E [\[ 

h,...M=\Ta\ \j=l 



When writing , = {[mh,-ij — Efhh^-i^] + Emh-ij)"^, multiplying out and collecting terms, we 
see that only the terms involving 



fhh^.,^ - Efhh^^i^ = J2 - l)/h) Yl ^(fe - (Yi - E{Yi)) 

but not Erhh-ij have to be dealt with, since for p = 1, . . . , 8 

E[mh,-^^ - Emh,-if^''{Emh,-^^r = 0{T-p'^). 
Therefore we can and will assume from now on that $E(Y_j) = 0$. For $\lfloor Ta \rfloor \le i_1, \dots, i_4 \le \lfloor Tb \rfloor$ we have by non-negativity of $K$ and since $N_T(i_\nu / T) \ge O(1/T)$ for $\nu = 1, \dots, 4$,



44 — 1 



Ut=iUie{,.MKii,/h-l/h) 
Ut=iNTiiu/Tr 



14 — 1 



01 Mo ^ 



= 0(max(ii,...,i^)7r«) 

Here we used the fact that a strictly stationary sequence {^„} ensuring the impos ed moment and a- 
mbcing conditions satisfies j„ • • • '^j2m)l = 0(n™), for m G N, cf. lYokoyamal (11980 , 

proof of Theorem 1, p. 47) and lKimI ([19931) for the slightly weaker conditions. Thus, 



EiT^[QT{b)-QT{a)])^ = 0{\b-a\^ 
Holder's inequality now ensures that for sq < si < S2 < 1 

E\T^Qt{s) - T^Qt{si)\''\T^Qt{s2) - t2Qt(s)P 



< ^E\T^Qt{s) - T^Qt{si)\^^/E\T^Qt{s2) 
= 0{\s - si\\s2 - s\) 

= 0(|S2-Sl|'), 



T^Qt{s)\'' 



which verifies the criterion of Billingsley (1968, Theorem 15.6). □



4.2 The Process $L_T$ and the Cross-Validation Criterion

The next theorem provides a functional central limit theorem for the process $L_T$.

Theorem 4.2. Let {e„} is a strictly stationary sequence with E{ei) = 0, E{ef) < oo and a-mixing 
coefficients, {a (A;)}, satisfying 

oo oo 

< oo, 



k=0 

for some ( £ (0, 1). Then 



k=l 



TLt{s) Ci:{s) 



Jo K{^{u-v))dv 



(4.11)
in $D([0,1]; \mathbb{R})$, as $T \to \infty$. The limit process is a.s. continuous.

Proof. Again, by virtue of the Skorohod/Dudley/Wichura representation theorem, we assume w.l.o.g.
that \\St - -BJ lloo 0, a.s., as T oo. Notice that 

Lt{s) = -- ^ 



2 /-LT.J/T / Y.f:^-'K{{[Tu\-3)/h)Y,/VT \ 
TJlTsoi/T\ T-^j:f:t K{{[Tu\-f)/h) 



leading us to the representation

$$T L_T(s) = -2 \int_{\lfloor T s_0 \rfloor / T}^{\lfloor T s \rfloor / T} I_T(u) \, dS_T(u)$$

with



ATu\/T-l/T I i-\Tu\/T-l/T 

lT{n)= K{{[Tu\-[Tv\)/h)dST{v) K{{[Tu\ - [Tz\) /h) dz 

■■\Tu\ /T-l/T 

E^{v)dST{v), 



where $N_T$ is defined in (4.6) and

$$E_T^u(v) = K((\lfloor Tu \rfloor - \lfloor Tv \rfloor)/h) \, N_T^{-1}(u), \qquad u, v \in [s_0, 1], \ v < u.$$

Recall that $N_T(s)$, $s \in [s_0, 1]$, is not smaller than $\inf_{s \in [s_0, 1]} \int_0^s K(\xi(s - z)) \, dz + o(1)$, which is
bounded away from 0. As in the proof of Theorem 4.1 one can show that for fixed $u$

{v^K{{[Tu\ - [Tv\)/h)N:^\u),ST) iv^KiCiu-v)),ST), 
in $D([s_0, u]; \mathbb{R}^2)$, as $T \to \infty$, such that Theorem 3.1 guarantees that the process

\v^K{{[Tu\ - [Tv\)/h),STAj^^^ Kii[Tv\ - [Tu\)/h) dSriv) : u' G [so,u]\ j 

converges weakly in $D([s_0, u]; \mathbb{R}^3)$ to the process

(v^K{au-v)),ST,{ r K{au-v))dB^{v):u' e[so,u]\] . 



Now we apply the diagonal argument given in the proof of Theorem 4.1 to obtain the fidi convergence
of $I_T$,



It{u)^^' I{u) = £ K{C{u-v))dBUu) j £ K{C{u-v))dv, 



as $T \to \infty$. To extend that result to weak convergence in $D([0,1]; \mathbb{R})$, it remains to show tightness
of the process $L_T$. We may argue as in the proof of Theorem 4.1. Again applying Yokoyama (1980,
proof of Theorem 1, p. 47), we obtain



■LTfeJ 

E\ I ET{u)dST{u) 

\Ta\ 



4 [TbJ 4 

f2 E X{ET{t,/T)E{Y,,...Y,, 



n,...M=\Ta\ j=l 



< sup 

XGIR,T>1 



iu...,H = lTa\ 



yxm,T>i V / 

0(\b-a\^). 



Thus, for So < r < s < 1, 

\\lTis)-lT{r)\U = 
Holder's inequality now entails that 



.[TsJ/T 

-2 / ET{u)dST{u] 

'[TrJ/T 



0(|s-r|i/2). 



$$E|I_T(s) - I_T(s_1)|^2 |I_T(s_2) - I_T(s)|^2 \le \sqrt{E|I_T(s) - I_T(s_1)|^4} \sqrt{E|I_T(s_2) - I_T(s)|^4} = O(|s - s_1| |s_2 - s|) = O(|s_2 - s_1|^2),$$

thus establishing tightness. We can conclude that

$$I_T \Rightarrow I \quad \text{in } D([s_0, 1]; \mathbb{R}),$$

as T — oo. Again considering equivalent processes on a new probability space, we may assume that 
\\St — -Bjlloo — )• as well as \\It — I\\oo, as T — )• oo. The same argument as used to obtain (14.101 ) 
yields {St, It) (B^, /) in L>([so, l],^^), as T oo. A further application of Theorem ITTlvields 

{St,It,TLt) = {^StJt, j LTdST^ (B^JXa), 

in $D([s_0, 1]; \mathbb{R}^3)$, as $T \to \infty$, which completes the proof. □

We may now easily combine the results of Theorem 4.1 and Theorem 4.2. Since the convergence
rates of $Q_T$ and $L_T$ differ, the asymptotic distribution of $T C_{T,s}$ is dominated by the process $T L_T$.

Theorem 4.3. Suppose that {e„} is a strictly stationary martingale difference sequence with E{ei) = 
0, E{e\) < oo and a-mixing coefficients satisfying 

oo oo 

< oo, 

fc=0 fe=l 

for some C £ (0, 1). Then the cross-validation objective function, CT,s{h) satisfies a functional central 
limit theorem, 

$$T C_{T,s}(h) \Rightarrow \mathcal{L}_\xi(s),$$

as $T \to \infty$, in the space $D([0,1]; \mathbb{R})$, where the process $\mathcal{L}_\xi$ is as in (4.11).

4.3 The Cross-Validated Bandwidth Process 

To simplify the exposition, let us from now on strengthen Assumption [23] to

$$h = h(\xi) = T/\xi,$$

such that the problem is parameterized by $\xi$. Let us assume that optimization is done over a fine grid

$$\Xi = \{\xi_1, \dots, \xi_M\},$$

where $M \in \mathbb{N}$ is arbitrarily large but fixed. Now at each time instant $s$ the minimizer

$$\widehat{\xi}_T(s) = \operatorname{argmin}_{\xi \in \Xi} C_{T,s}(\xi)$$

is calculated, where $C_{T,s}(\xi) = C_{T,s}(T/\xi)$. Here and in the sequel the operator $\operatorname{argmin}_{a \in A} f(a)$ for a
function $f : A \to \mathbb{R}$ refers to the smallest $a \in A$ such that $f(a) \le f(x)$ for all $x \in A$, thus leading to
a unique definition.
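
The selection rule can be illustrated numerically. The following Python sketch is only an illustration under assumptions that are not stated in this section: it takes the one-sided leave-out predictor $\widehat{m}_{h,-i} = \sum_{j<i} K((i-j)/h) Y_j / \sum_{j<i} K((i-j)/h)$ and the criterion $C_{T,s}(h) = T^{-1} \sum_{i=\lfloor Ts_0 \rfloor}^{\lfloor Ts \rfloor} (Y_i - \widehat{m}_{h,-i})^2$ as working definitions, uses a Gaussian kernel, and the names `gaussian_kernel`, `cv_path`, `select_bandwidth` as well as the grid values are hypothetical.

```python
import numpy as np

def gaussian_kernel(x):
    # Assumed kernel; any non-negative kernel could be used instead.
    return np.exp(-0.5 * x ** 2)

def cv_path(y, xi, s0=0.1, kernel=gaussian_kernel):
    """Assumed criterion C_{T,s}(T/xi), evaluated at s = i/T for i = start, ..., T."""
    T = len(y)
    h = T / xi                                # bandwidth h = T / xi
    start = max(int(np.floor(T * s0)), 2)     # at least one past observation is needed
    sq_err = np.zeros(T + 1)
    for i in range(start, T + 1):             # 1-based time index i
        j = np.arange(1, i)                   # past time points j < i
        w = kernel((i - j) / h)
        pred = np.dot(w, y[:i - 1]) / w.sum() # one-sided kernel prediction of Y_i
        sq_err[i] = (y[i - 1] - pred) ** 2
    return np.cumsum(sq_err)[start:] / T

def select_bandwidth(y, grid, s0=0.1):
    """Smallest minimizer over the grid at each time instant."""
    grid = np.sort(np.asarray(grid, dtype=float))
    crit = np.vstack([cv_path(y, xi, s0) for xi in grid])  # shape (M, n_times)
    # np.argmin returns the first index attaining the minimum; since the grid
    # is sorted in ascending order this is the smallest minimizing grid value.
    return grid[np.argmin(crit, axis=0)]

rng = np.random.default_rng(0)
y = rng.standard_normal(200)                  # pure noise, i.e. no change in the mean
xi_hat = select_bandwidth(y, grid=[1.0, 2.0, 5.0, 10.0])
```

Sorting the grid before calling `np.argmin` is what implements the "smallest minimizer" convention; ties are resolved in favour of the smallest grid value.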

We obtain the following corollary. 

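Corollary 4.1. Given the conditions of Theorem 4.3,

$$\{T C_{T,s}(\xi) : \xi \in \Xi\} \Rightarrow \{\mathcal{L}_\xi(\cdot) : \xi \in \Xi\}, \qquad (4.12)$$

as $T \to \infty$, in the product space $(D([s_0, 1]; \mathbb{R}))^M$. Consequently,

$$\widehat{\xi}_T \Rightarrow \operatorname{argmin}_{\xi \in \Xi} \mathcal{L}_\xi,$$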

as $T \to \infty$, in $D([s_0, 1]; \mathbb{R})$.

Proof. The process $\{T C_{T,s}(\xi) : \xi \in \Xi\}$ is tight, since the coordinate processes $\{T C_{T,s}(\xi) : s \in [s_0, 1]\}$
are tight for each $\xi \in \Xi$. To check convergence of the fidis, we consider a linear combination

Ht{s) = Y,hT^QTi^) 

for A^, ^ G H, such that not all vanish. We can represent Ht{s) as 

rlTsl/T ATs\/T /-[Tsl/T 

/ / / D{uT/h,vT/h,wT/h){N^Y^{w)dwdST{u)dST{v) 

J\Tsa\/T J\Tso\/T J\T^\/T ^ 
We have shown in the proof of Theorem 14. 3 1 that for fixed ^ G H 

f\Ts\/T 

g"/{u-i)= / D{uT/h,vT/h,[Tw\/h){N^)-^{w)dwdST{u)ldST{v) 
converges uniformly in n, u G [7, 1] and w > s^to 



as T — 00. Then the triangle inequality shows that J2^eE^£,9T^i'^-iO converges uniformly to 
A^g'"'*(ti; as T — ^ 00. Now we can apply exactly the same arguments as in the proof of 
Theorem 14. 3 1 to obtain the fidi convergence 



{Ht{si),--- ,Ht{sn))^ ding G''^Hv)dB{v)^ 



N 



i=l 



as T — )■ c«, for fixed si, . . . , sat. The same chain of arguments shows that the fidis of X^TLj,{-) 
converge weakly to the fidis of Yl^£- ^uch the fidi convergence of Yl^^- ^(^T, {£,) follows. 

Again, tightness of the linear combination follows easily from the triangle inequality for the $L_p$ norm.
Since $\Xi$ is a finite set, we immediately obtain that $\{T C_{T,s}(\xi) : \xi \in \Xi\}$ converges weakly to
$\{\mathcal{L}_\xi : \xi \in \Xi\}$, as $T \to \infty$. But this implies the weak convergence result for the smallest minimizer. □



5 AN ANSCOMBE-TYPE THEOREM FOR RANDOM TIME HORIZONS

The results of the previous sections assume that monitoring stops at the latest at the non-random time
horizon $T$, and the theory is nicely captured by sequential empirical processes being elements of Skorohod
spaces of functions defined on $[0, 1]$, such as $D([0, 1]; \mathbb{R})$. Here the unit interval corresponds to the
physical time interval $[0, T]$. The limit theorems then provide approximations to the true distribution
of the sequential processes when $T$ is fixed but large.

Let us now assume that the time horizon $T$ is determined by a parameterized family of random
experiments given by a family $\{\tau_a : a > 0\}$ of random variables, frequently stopping times, taking
values in the natural numbers. This may happen if, for example, the time horizon is determined as the
time instant where cumulated costs exceed a threshold for the first time. The question arises whether
in limit theorems, say for (standardized) sums of $T$ terms, one may replace $T$, assumed to tend to
$\infty$, by a family of random variables indexed by $a > 0$ which behaves as $\lambda a$, $\lambda$ a positive constant,
as $a \to \infty$. This condition ensures that $\tau_a$ tends to $\infty$ as $a \to \infty$, so that one can hope that
the asymptotics $T \to \infty$ can be replaced by $a \to \infty$ when replacing $T$ by $\tau_a$. This issue has been
extensively studied in the literature. Anscombe's seminal paper on this topic, Anscombe (1952), gave
sufficient conditions for this to be true. Applied to sums of i.i.d. random variables, his result is as
follows.



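Theorem 5.1. (Anscombe, 1952)

Let $X_1, X_2, \dots$ be i.i.d. random variables with mean $0$ and common variance $\sigma^2 \in (0, \infty)$ and put
$S_n = \sum_{i=1}^n X_i$, $n \in \mathbb{N}$. Suppose that the family $\{\tau_a : a > 0\}$ of random indices satisfies

$$\frac{\tau_a}{a} \xrightarrow{P} \lambda \in (0, \infty), \qquad (5.1)$$

as $a \to \infty$. Then

$$\frac{S_{\tau_a}}{\sigma \sqrt{\tau_a}} \xrightarrow{d} N(0, 1),$$

as well as

$$\frac{S_{\tau_a}}{\sigma \sqrt{\lambda a}} \xrightarrow{d} N(0, 1),$$

as $a \to \infty$.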
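
A small Monte Carlo experiment can make the statement of Theorem 5.1 tangible. The Python sketch below is an illustration only and rests on assumptions of our own choosing: standard normal increments (so $\sigma = 1$), and a hypothetical stopping rule that stops when the cumulated squared increments first exceed $a$, for which $\tau_a / a \to 1 = \lambda$. The stopping time depends on the increments themselves, yet both standardizations should be approximately standard normal.

```python
import numpy as np

rng = np.random.default_rng(1)
reps, n_max, a = 2000, 1000, 500.0

x = rng.standard_normal((reps, n_max))      # i.i.d. N(0,1): mean 0, sigma = 1
cost = np.cumsum(x ** 2, axis=1)            # hypothetical cumulated cost process
assert np.all(cost[:, -1] > a)              # every path crosses the threshold

# tau_a = first n with cost_n > a; it is computed from the increments themselves.
tau = np.argmax(cost > a, axis=1) + 1
s_tau = np.cumsum(x, axis=1)[np.arange(reps), tau - 1]   # stopped partial sums

z_random_norm = s_tau / np.sqrt(tau)        # S_tau / (sigma * sqrt(tau_a))
z_fixed_norm = s_tau / np.sqrt(1.0 * a)     # S_tau / (sigma * sqrt(lambda * a)), lambda = 1

print(np.mean(tau) / a, z_random_norm.mean(), z_random_norm.var(), z_fixed_norm.var())
```

The printed values are the empirical ratio $\tau_a / a$ and the sample mean and variances of the two standardized stopped sums; for large $a$ they should be close to 1, 0, 1 and 1, respectively.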



Anscombe's result belongs to the fundamental insights on sequential methodologies and can be
found in various monographs such as Siegmund (1985) or Gosh et al. (1997). It is worth mentioning
that in its basic form it addresses a sequence $\{Z, Z_n\}$ which converges weakly, i.e. $Z_n \Rightarrow Z$, as
$n \to \infty$. Provided that given $\varepsilon > 0$ there exist $\delta > 0$ and $n_0 \in \mathbb{N}$, such that

$$P\Big( \max_{\{k : |k - n| < n\delta\}} |Z_k - Z_n| > \varepsilon \Big) < \varepsilon, \qquad n \ge n_0, \qquad (5.2)$$

a condition called uniform continuity in probability, Anscombe shows that $Z_{\tau_a} \Rightarrow Z$, as $a \to \infty$.
His results have been adapted to many applications and generalized considerably. For example, when
strengthening (5.1) to



aX 



where $a^{-1} \le \delta_a \to 0$, as $a \to \infty$, then a Berry-Esseen result holds true, that is the distribution of
$S_{\tau_a}/(\sigma \sqrt{\tau_a})$ converges uniformly to the standard normal distribution function, cf. Gosh et al. (1997,
Theorem 2.7.3). Gut (1991) established Anscombe-type laws of the iterated logarithm by strengthening (5.2) to

$$\sum_{n=1}^{\infty} P\Big( \max_{\{k : |k - n| < n\delta\}} |Z_k - Z_n| > \varepsilon \Big) < \infty. \qquad (5.3)$$



For further extensions in this direction, e.g., to $U$-statistics, and applications we refer to Gosh and Dasgupta
(1980), Mukhopadhyay (1981), and Mukhopadhyay and Vik (1985), amongst others. Finally, it is
known that Anscombe's central limit theorem stated in Theorem 5.1 extends to a functional central
limit theorem with Brownian motion as the limit process; we refer to Billingsley (1999),
Larsson (2000) and Gut (2009), amongst others.



Particularly having in mind complex applications where concrete definitions of the random time
horizon may be unknown to the statistician when designing the sequential procedure, it is remarkable
that the result holds true without any condition on the dependence between the increments of the partial sums
in Theorem 5.1, i.e. $\{X_n : n \ge 1\}$, and the family of stopping times $\{\tau_a : a > 0\}$. Even stopping
times which analyze the random increments directly can be used without affecting the asymptotic
normality for $a \to \infty$. Indeed, a standard example for a family $\{\tau_a : a > 0\}$ satisfying Anscombe's
condition (5.1) is the first passage time of the random walk related to an i.i.d. sequence $X_1, X_2, \dots$
with common mean $\mu > 0$, e.g., costs associated with the continuation of the sequential procedure,

$$\tau_a = \inf\{T \in \mathbb{N} : S_T > a\}, \qquad a > 0,$$

where, as in the above theorem and with some abuse of the notation used in previous sections,



$$S_T = \sum_{i=1}^{T} X_i, \qquad T \in \mathbb{N}.$$

Then it is well known that

$$\frac{\tau_a}{a} \xrightarrow{a.s.} \frac{1}{\mu},$$

as $a \to \infty$, cf. the proof of Lemma 2.9.2 in Gosh et al. (1997).
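
The first passage example can be checked numerically. The following Python sketch is a minimal illustration under assumptions chosen here for convenience: exponentially distributed increments with mean $\mu = 0.5$ and a threshold $a = 5000$; the variable names are hypothetical. It computes $\tau_a$ for one simulated random walk and compares $\tau_a / a$ with $1/\mu$.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, a, n_max = 0.5, 5000.0, 30000

x = rng.exponential(scale=mu, size=n_max)   # i.i.d. increments with mean mu > 0
s = np.cumsum(x)                            # random walk S_T, T = 1, ..., n_max
assert s[-1] > a                            # the threshold is crossed within n_max steps

tau_a = int(np.argmax(s > a)) + 1           # first T with S_T > a (1-based)
ratio = tau_a / a                           # should be close to 1 / mu for large a

print(tau_a, ratio, 1.0 / mu)
```

With positive increments the walk is increasing, so `np.argmax(s > a)` returns the unique crossing index; for general increments it still returns the first crossing, which is what the definition of $\tau_a$ requires.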

As a second important example let us consider the following sequential estimation setting
discussed by Anscombe in his 1952 paper. That example also shows that Anscombe's results address a
deficiency of sequential procedures such as the sequential probability ratio test, namely the fact that an
open-ended stopping rule which is applied in order to stop sampling as soon as it is possible to decide
in which subset of the parameter space a parameter lies may lead to sample sizes which are too small
for estimation of parameters, cf. the discussion in Siegmund (1985, Ch. 5). That early-stopping issue
can be approached as follows. Aiming at estimating a parameter $\theta$ from the data we sample until an
estimate of the estimator's dispersion is less than or equal to some threshold $c_a$, where $c_a \downarrow 0$ as $a \to \infty$,
and then estimate the parameter by an estimator $\widehat{\theta}_n$ which is assumed to converge in distribution after
standardization. Provided that the family

$$\tau_a = \inf\{n \in \mathbb{N} : \mathrm{s.d.}(\widehat{\theta}_n) \le c_a\}, \qquad a > 0,$$

defined in this way satisfies

$$\tau_a / \tau_a^* \to 1, \quad \text{in probability, as } a \to \infty,$$

where

$$\tau_a^* = \inf\{n \in \mathbb{N} : \sqrt{\mathrm{Var}(\widehat{\theta}_n)} \le c_a\}, \qquad a > 0,$$

is the corresponding least sample size such that the true dispersion of the estimator is less than or equal to
$c_a$, Anscombe shows that the above sequential sampling scheme yields an estimator which inherits
the asymptotic distribution with the true dispersion replaced by $c_a$. This means that one may achieve
estimation with given small accuracy $c_a$.

Our interest is now to extend the weak convergence results for the cross-validation criterion to the
case of a random time horizon. We shall see that the time horizon can indeed be replaced by a family
of random indices under quite general conditions, but the interpretation differs: By randomizing the
time horizon in such a controlled way instead of fixing it at a large value, we may ensure certain
properties, such as a guaranteed accuracy of some estimator of interest, in the case that a (closed-end)
stopping rule did not lead to a signal before the time horizon. This is particularly beneficial
when monitoring a time series automatically and expecting a signal indicating a change only with low
probability, such that the typical outcome is that the procedure runs until time $T$. Having reached
the time horizon $T$, one might be interested in analyzing the sample obtained in this way using classic
methods of estimation and testing.

Another motivation is that there may be events which should trigger immediate termination of a
monitoring procedure. As an example, suppose one monitors the mean of an investment portfolio by
applying the procedure to the (discounted) value process of the portfolio, in order to get an alarm if
the investment strategy performs poorly. But in case that the associated risk $r_n$, which can be measured
by a dispersion statistic such as the standard deviation or by value-at-risk, cf. Steland (2012), or
the risk of some other important financial variable exceeds an upper risk limit, one should terminate
immediately. This gives rise to a family of stopping times such as $\tau_a = \inf\{n < T' + 1 : r_n > \bar{r}_a\}$,
where $T' = T$ or $T' = \infty$, and $\bar{r}_a$ is the upper risk limit parameterized by $a > 0$.

In what follows, we shall discuss a random time horizon limit theorem for the cross-validation
process, which is affected when applying an Anscombe-type random stopping procedure to the time
horizon of the detectors defined in (2.4) and (2.5), respectively. However, it will turn out
that the arguments go through for many other processes as well.

Recall that the cross-validation process $C_{T,s}(h)$ is dominated by the process $L_T(s)$ and satisfies

$$\mathcal{C}_T(s) = T C_T(s) \Rightarrow \mathcal{L}_\xi(s),$$






as $T \to \infty$. We are interested in the randomly stopped sequential processes
and 

The following main result of this section provides an Anscombe-type theorem for $\mathcal{C}_{\tau_a}$. Its proof
is based on the key observation that in our setting the random stopping can be interpreted as a random
change of time.

Theorem 5.2. Let $\{\tau_a : a > 0\}$ be a family of random variables taking values in $\mathbb{N}$ such that condition
(5.1) holds true for some $\lambda < 1$. Then the process $\{\mathcal{C}_{\tau_a} : a > 0\}$ with random time horizon $\tau_a$ satisfies
the functional central limit theorem

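as $a \to \infty$, in $D([s_0, 1]; \mathbb{R})$. Further, if $\Xi = \{\xi_1, \dots, \xi_M\}$ as in Subsection 4.3, then

$$\widehat{\xi}_{\tau_a}(s) \Rightarrow \operatorname{argmin}_{\xi \in \Xi} \mathcal{L}_\xi(\lambda s), \qquad (5.4)$$

as $a \to \infty$, in $D([s_0, 1]; \mathbb{R})$, provided the assumptions imposed there hold true.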



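Proof. The proof draws on Billingsley (1999), but our setting differs slightly. For constants $A < B$
and $C < D$ let $D_0([A, B]; [C, D])$ denote the set of those elements $f$ of $D([A, B]; [C, D])$ that
are nondecreasing and satisfy $C \le f(t) \le D$ for all $t$; $C_0([A, B]; [C, D])$ is defined accordingly.
Introducing the parameter $T' = \lfloor a \rfloor$, $a > 0$, we can embed $\mathcal{C}_{\tau_a}$ into the sequence $\{\mathcal{C}_{T'} : T' \ge 1\}$ of
processes via the crucial identity

$$\mathcal{C}_{\tau_a}(s) = \mathcal{C}_{T'}\Big( \frac{\tau_a}{T'} s \Big), \qquad s \in [s_0, 1], \ T' \in \mathbb{N},$$

that is

$$\mathcal{C}_{\tau_a} = \mathcal{C}_{T'} \circ \Phi_{T'}, \qquad T' \in \mathbb{N}, \qquad (5.5)$$

where

$$\Phi_{T'}(s) = \frac{\tau_a}{T'} s, \qquad s \in [s_0, 1], \ T' \in \mathbb{N}.$$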

Notice that $\tau_a / a \to \lambda$, as $a \to \infty$, and $\lambda < 1$ imply that $\Phi_{T'}$ takes values in $C_0([s_0, 1]; [\lambda s_0 - \varepsilon, 1])$ for
large enough $T'$, given any arbitrarily small $\varepsilon > 0$. The result now follows easily from the representation
(5.5). We have the joint weak convergence

$$(\mathcal{C}_{T'}, \Phi_{T'}) \Rightarrow (\mathcal{L}_\xi, \Phi),$$

as $T' \to \infty$, in the product space $D([\lambda s_0 - \varepsilon, 1]; \mathbb{R}) \times D_0([s_0, 1]; [\lambda s_0 - \varepsilon, 1])$, where $\Phi \in C_0([s_0, 1]; [\lambda s_0, 1])$
is the multiplication with $\lambda$, i.e. $\Phi(s) = \lambda s$ for $s \in [s_0, 1]$. Indeed, $\Phi_{T'}$ converges weakly to the
non-random continuous element $\Phi$, since by virtue of Anscombe's assumption (5.1)

$$\sup_{s \in [s_0, 1]} |\Phi_{T'}(s) - \Phi(s)| \le \Big| \frac{\tau_a}{T'} - \lambda \Big| \xrightarrow{P} 0,$$

as $a \to \infty$, which implies $\Phi_{T'} \Rightarrow \Phi$ as $T' \to \infty$. Thus the result follows by an application of
the continuous mapping theorem, since the composition of mappings is a continuous functional, cf.
Billingsley (1999, p. 151), and we can conclude that

$$\mathcal{C}_{\tau_a} = \mathcal{C}_{T'} \circ \Phi_{T'} \Rightarrow \mathcal{L}_\xi \circ \Phi,$$

as $a \to \infty$. The proof of (5.4) is left to the reader. □
Remark 5.1. The above result and its method of proof deserve some discussion.

(i) An inspection of the proof of Theorem 5.2 reveals that the arguments carry over to any empirical
process $X_T(s)$, particularly partial sum processes, such that the (functional) dependence on $T$
and $s$ is via multiplication $Ts$.

(ii) By either restricting the domain $[s_0, 1]$ to $[s_0, 1/\lambda]$ or taking the natural extension of the limit
theorems of the previous section to the spaces $D([A, B]; \mathbb{R})$ for $[A, B] \subset [s_0, \infty)$, one may
easily generalize the result to an arbitrary limit $\lambda \in (0, \infty)$ of $\tau_a / a$.

(iii) The proof relies on the joint convergence of the process of interest, $\mathcal{C}_{T'}$, and the transformations,
$\Phi_{T'}$, which holds true if $\Phi_{T'}$ converges to a non-random function. The latter is guaranteed
by condition (5.1), which already appeared in Anscombe (1952). However, the more general
condition

$$\frac{\tau_a}{a} \xrightarrow{d} \Lambda, \qquad (5.6)$$

as $a \to \infty$, for some random variable $\Lambda$, requires an explicit proof of the joint weak convergence.
This may require much more knowledge on $\mathcal{C}_{T'}$, the definition of $\tau_a$ and the dependence
between both. Only in the case that $\{\mathcal{C}_{T'} : T' \ge 1\}$ and $\{\tau_a : a > 0\}$ are independent, the joint
weak convergence again follows.

Our discussion suggests formulating the following corollary for the important special case that
the random experiment conducted to determine the time horizon is independent of the observations,
in order to extend the scope of our results to families of stopping times satisfying (5.6).

Corollary 5.1. Let $\{\tau_a : a > 0\}$ be a family of random variables taking values in $\mathbb{N}$. If $\{\tau_a : a > 0\}$
is independent of $\{X_{Tn} : 1 \le n \le T, T \ge 1\}$ and satisfies

$$\frac{\tau_a}{a} \xrightarrow{d} \Lambda,$$

as $a \to \infty$, for some random variable $\Lambda$ taking values in $(0, 1]$, then the randomly stopped process
$\mathcal{C}_{\tau_a}$ satisfies
Cr^ satisfies 



Jo Jo - W dv 



as $a \to \infty$. Further, given the assumptions imposed in Subsection 4.3,

$$\widehat{\xi}_{\tau_a}(s) \Rightarrow \operatorname{argmin}_{\xi \in \Xi} \mathcal{L}_\xi(\Lambda s),$$

as $a \to \infty$, in $D([s_0, 1]; \mathbb{R})$, where $\Xi = \{\xi_1, \dots, \xi_M\}$.